Data and Workload Distribution in a Multithreaded Architecture
Authors
Abstract
Matching data distribution to workload distribution is important for improving the performance of distributed-memory multiprocessors. Although both can be tailored to fit a particular problem to a particular distributed-memory architecture, doing so is often difficult because of complex address computation, runtime data movement, and irregular resource usage. This report presents our study of multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various workload distribution strategies are defined along with thread granularity. Several data distribution strategies are investigated, including row-wise cyclic, k-way partial-row cyclic, and blocked distribution. To investigate the performance of multithreading, two problems are selected: the highly sequential Gaussian Elimination with Partial Pivoting and the highly parallel Matrix Multiplication. Execution results on the 80-processor EM-4 distributed-memory multiprocessor indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
1 Dept. of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102-1982, [email protected]
2 Computer Architecture Section, Electrotechnical Laboratory, 1-1-4 Umezono, Tsukuba-shi, Ibaraki 305, Japan, [email protected]
3 Dept. of EE-Systems, EEB336, University of Southern California, Los Angeles, California 90089-2563, {nyoo, gaudiot}@usc.edu
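As a rough illustration of two of the data distribution strategies named in the abstract, the sketch below maps matrix rows to processors under a row-wise cyclic and a blocked layout. The function names, the ceiling-sized blocks, and the 8-row, 4-processor example are assumptions made for illustration, not the authors' implementation. The k-way partial-row cyclic layout is left out because the abstract does not define it.

```python
def cyclic_owner(row: int, num_procs: int) -> int:
    """Row-wise cyclic layout: row i is stored on processor i mod P."""
    return row % num_procs


def blocked_owner(row: int, num_rows: int, num_procs: int) -> int:
    """Blocked layout: each processor stores one contiguous chunk of rows.

    Assumption: chunks hold ceil(N / P) rows, so the last processor
    may hold fewer rows than the others.
    """
    block = -(-num_rows // num_procs)  # ceiling division
    return row // block


if __name__ == "__main__":
    n, p = 8, 4  # hypothetical sizes: 8 rows, 4 processors
    print("cyclic :", [cyclic_owner(i, p) for i in range(n)])
    print("blocked:", [blocked_owner(i, n, p) for i in range(n)])
```

In an elimination-style computation, where the set of active rows shrinks at every step, a cyclic layout tends to keep every processor holding some active rows, while a blocked layout leaves the owners of the early rows idle. This is one common form of the mismatch between data and workload distribution that the abstract refers to.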
Related resources
Measurement and Modeling of EARTH-MANNA Multithreaded Architecture
In this paper, we develop and apply an analytical model to predict the performance of McGill's EARTH-MANNA multithreaded multiprocessor system. The performance model is evolved from a closed queuing network model for multithreaded architectures reported in our earlier work [17]. In this work, we extend the original model to account for the complications due to realistic subsystem interactions a...
Looking for Novel Ways to Obtain Fair Measurements in Multithreaded Architectures
Current methodologies do not provide representative results for the evaluation of multithreaded architectures, which could lead to unfair or misleading conclusions. This paper presents FAME, a novel evaluation methodology aimed to fairly measure the performance of multithreaded processors. FAME reexecutes all threads in a multithreaded workload until all of them are fairly represented in the fi...
Speculative Precomputation
Current processors are based on a multithreaded architecture. Simultaneous Multithreading (SMT) techniques are used to increase instruction throughput under a multiprogramming workload; however, it does not improve performance when only a single thread is executing. This communication explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture...
Measuring the Performance of Multithreaded Processors
Nowadays, multithreaded architectures are becoming more and more popular. In fact, many processor vendors have already shipped processors with multithreaded features. Regardless of this push on multithreaded processors, still today there is not a clear procedure that defines how to measure the behavior of a multithreaded processor. This paper presents FAME, a new evaluation methodology aimed to...
Performance Characterization of a Multithreaded Architecture: Where Are the Benefits?
Multithreaded architectures hold the promise of high performance through an overlap of computation and communication. This paper explores how the overlap in multithreaded execution affects the performance of processor, memory, and network subsystems; what are the critical parameters to ensure high processor performance; and what is the performance impact of optimizations of the workload and arch...
Journal title:
- J. Parallel Distrib. Comput.
Volume 40, Issue
Pages -
Publication date: 1997